Introduce multi-period Account data type and use it for MultiBalanceReport and BudgetReport. #2360

Open
wants to merge 20 commits into
base: master

Xitian9
Collaborator

@Xitian9 Xitian9 commented Mar 26, 2025

This rejigs the MultiBalanceReport internals to use an enhanced Account data type to store the report values. This has a few effects:

  • A small speed improvement on large journal files with interval reporting, due to processing the posting list in one pass.
  • Simplified program logic, as there was a lot of code to convert back and forth between list and tree representations. This can now be removed.
  • Ability to merge Account means that BudgetReport can be simplified.

There are some small changes in budget report behaviour. Some of the old behaviour looks like it was implemented as a workaround, to get the budget and actuals into the same shape so they could be merged. That workaround is no longer necessary, though the behaviour may still be desired for other reasons.

Let me know your thoughts.

@Xitian9 Xitian9 force-pushed the multiaccount branch 5 times, most recently from d8d6312 to 04cb729 Compare March 29, 2025 11:47
@simonmichael simonmichael added needs-review To unblock: needs more code/docs/design review by someone performance Anything performance-related (run time, memory usage, disk space..) labels Apr 2, 2025
Owner

@simonmichael simonmichael left a comment

Thanks! Initial comments.


# ** 16. balance --flat --empty does not display accounts which have not been
# seen, even if they're implied, but does show accounts that have been seen
# with 0 balance.
Owner

I'm not sure what this means in userese ?

Collaborator Author

It means that if there have never been any postings to assets, then we shouldn't display a value for the assets account, even with --empty. On the other hand, we should show assets:bank:checking, since there have been postings to that account.

This was the case before, but turned out to be a non-trivial thing to maintain in the refactoring, so I added a test.
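
As a rough illustration of the rule described above (not hledger's actual code), the sketch below keeps a row only for accounts that have received postings. The simplified `Posting` pair and the names `postedBalances`, `flatRows` and `examplePostings` are hypothetical, chosen for this example.

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical simplified posting: (account name, amount).
type Posting = (String, Integer)

-- Sum amounts per account. Only accounts that were actually posted to
-- ever get a key, so implied parents never appear here.
postedBalances :: [Posting] -> M.Map String Integer
postedBalances = M.fromListWith (+)

-- Rows of a flat report. Without showEmpty, zero balances are dropped.
-- With showEmpty, zero balances are kept, but implied parents still
-- have no row because they never entered the map.
flatRows :: Bool -> [Posting] -> [(String, Integer)]
flatRows showEmpty ps =
  [ (a, b) | (a, b) <- M.toAscList (postedBalances ps), showEmpty || b /= 0 ]

examplePostings :: [Posting]
examplePostings =
  [ ("assets:bank:checking", 10)
  , ("assets:bank:checking", -10)
  , ("expenses:food", 5)
  ]
```

With these postings, `flatRows True examplePostings` keeps `assets:bank:checking` at zero but has no row for `assets` or `assets:bank`.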

Collaborator Author

I've changed the test description. Let me know if it's clearer.

@@ -50,7 +50,6 @@ Budget performance in 2016-12-01..2016-12-03:
                  ||          2016-12-01          2016-12-02           2016-12-03
==================++==============================================================
assets:cash       || $-10 [ 40% of $-25]  $-14 [56% of $-25]  $-51 [204% of $-25]
expenses          ||  $10 [ 40% of  $25]   $14 [56% of  $25]   $51 [204% of  $25]
Owner

Another behaviour change, worth noting with a ! in message.

The parent "expenses" account is not shown, because there's no explicit budget goal for it, and because we're in list mode ? So if we want to see aggregated budget performance, tree mode will be needed. Ok I guess.

Why is it still shown in the previous test ?

Collaborator Author

Let me look into this.

Collaborator Author

@Xitian9 Xitian9 Apr 25, 2025

It looks like the reason is as follows:

Unless called with -E, a budget report counts any unbudgeted subaccounts against their earliest budgeted parent. So both expenses:cab and expenses:movies are rolled up to expenses. Even though expenses doesn't have a budget itself, it gets the sum of the budgets of its subaccounts.

I'm not sure how I feel about this behaviour, but I think changing it is out of scope for this PR.
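
The roll-up described above can be sketched as follows. This is an illustration under stated assumptions, not hledger's implementation: it reads "earliest budgeted parent" as the nearest ancestor, treats every parent of an account with a budget goal as budgeted, and uses `<unbudgeted>` as a placeholder catch-all. All function names here are hypothetical.

```haskell
import qualified Data.Set as S
import Data.List (inits, intercalate)

type AccountName = String

-- Split "a:b:c" into ["a","b","c"].
components :: AccountName -> [String]
components s = case break (== ':') s of
  (c, [])       -> [c]
  (c, _ : rest) -> c : components rest

-- An account followed by its ancestors, nearest first:
-- "a:b:c" gives ["a:b:c", "a:b", "a"].
selfAndParents :: AccountName -> [AccountName]
selfAndParents = reverse . map (intercalate ":") . drop 1 . inits . components

-- Assumption: an account counts as budgeted if it, or any of its
-- subaccounts, has a budget goal.
budgetedAccounts :: [AccountName] -> S.Set AccountName
budgetedAccounts goals = S.fromList (concatMap selfAndParents goals)

-- Map a posting's account to the nearest budgeted account (itself or an
-- ancestor), or to a catch-all name when none is budgeted.
rollUp :: S.Set AccountName -> AccountName -> AccountName
rollUp budgeted a =
  case filter (`S.member` budgeted) (selfAndParents a) of
    (p : _) -> p
    []      -> "<unbudgeted>"
```

With a single hypothetical goal on `expenses:food`, `rollUp` sends `expenses:cab` to `expenses`, which matches the behaviour discussed: `expenses` has no goal of its own but still collects its unbudgeted subaccounts.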

data AccountBalances a = AccountBalances {
   abhistorical :: a              -- ^ historical balance information
  ,abdatemap    :: IM.IntMap a    -- ^ balance information associated to a start day
  } deriving (Eq, Functor, Generic)
Owner

What's the reason for using a type parameter ? Do we truly need it ?

How does the "start day" map, with Int keys I assume, work here ?

Owner

@simonmichael simonmichael Apr 3, 2025

> What's the reason for using a type parameter ? Do we truly need it ?

For budget report, I guess. It hurts code comprehensibility a bit.

Collaborator Author

Yes, the main use is for the budget report. But I think it also simplifies things a bit by exposing the functor, foldable, and traversable interfaces for AccountBalances, saving having to write out a lot of boilerplate to perform tasks that re-implement that functionality in a monomorphic container.
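
To make the point about the type parameter concrete, here is a minimal sketch. It assumes the Foldable and Traversable instances are obtained by deriving (the snippet quoted in the review only shows `Eq`, `Functor` and `Generic`), and the example values are invented.

```haskell
{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable #-}
import qualified Data.IntMap.Strict as IM

-- Same shape as the type under review; the deriving list is an assumption.
data AccountBalances a = AccountBalances
  { abhistorical :: a
  , abdatemap    :: IM.IntMap a
  } deriving (Eq, Show, Functor, Foldable, Traversable)

-- Invented example: a historical value plus two period values.
example :: AccountBalances Integer
example = AccountBalances 100 (IM.fromList [(0, 10), (7, -3)])

-- Functor: transform every stored value at once.
doubled :: AccountBalances Integer
doubled = fmap (* 2) example

-- Foldable: combine the historical value and all period values.
total :: Integer
total = sum example

-- Traversable: run a check on every value, failing if any check fails.
nonNegative :: AccountBalances Integer -> Maybe (AccountBalances Integer)
nonNegative = traverse (\x -> if x >= 0 then Just x else Nothing)
```

With a monomorphic container, each of `doubled`, `total` and `nonNegative` would need a hand-written traversal of both fields. The same parameter also lets a budget report store a pair of actual and goal values in place of a single balance.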

@simonmichael simonmichael added balance balancesheet incomestatement cashflow budget The balance command's --budget report balancesheetequity and removed needs-review To unblock: needs more code/docs/design review by someone labels Apr 3, 2025
@simonmichael
Owner

> A small speed improvement on large journal files

Could we quantify that a little more - eg "balance reports are 1% faster with 1k txns, 5% faster with 10k txns" ?

@simonmichael
Owner

Also I wonder if there's any memory impact, stats might be a quick way to check (it runs balance reports probably)

@Xitian9 Xitian9 force-pushed the multiaccount branch 3 times, most recently from e8a2ca8 to 509e5bc Compare April 24, 2025 10:01
Xitian9 added 6 commits April 25, 2025 09:39
This upgrades Account to enable it to do the hard work in
MultiBalanceReport, but does not use the new functionality just yet.
It continues to function as before by only using the "abhistorical"
value.

Ensure that implied accounts with no postings are not displayed, but
accounts with zero balance and actual postings are.
@Xitian9 Xitian9 force-pushed the multiaccount branch 2 times, most recently from 7223a5c to 00e02a2 Compare April 25, 2025 00:35
Xitian9 added 14 commits April 25, 2025 11:03
Rephrase everything in terms of boringness to make for a clearer logical
flow.

This removes the type alias Account, and replaces it with the
fully-qualified name Account AccountBalance. This breaks some backwards
compatibility, but that was already broken by the change of Account type
constructor in any case. This simplifies the interface.

Rename applyAccountBalance to mapAccountBalance.

mergeWithKey can create corrupt output if its inputs don't satisfy
certain conditions. We restrict the domain here to only those cases
where it is guaranteed safe. This still covers all the cases that we
need.

This keeps Hledger.Data.AccountBalance and Hledger.Data.AccountBalances
separate.
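
On the mergeWithKey commit note above: the containers documentation requires the two "only in one map" arguments to return a map whose keys are a subset of the keys they were given. One way to restrict the domain so that this always holds is sketched below. The wrapper name `mergeWith` and its signature are hypothetical; the PR's actual restriction may look different.

```haskell
import qualified Data.IntMap.Strict as IM

-- Merge two maps key by key. Values present on one side only are
-- transformed with fmap, which cannot add, drop, or rename keys, so the
-- documented precondition of mergeWithKey holds for any choice of the
-- three functions passed in.
mergeWith :: (a -> b -> c) -> (a -> c) -> (b -> c)
          -> IM.IntMap a -> IM.IntMap b -> IM.IntMap c
mergeWith both onlyLeft onlyRight =
  IM.mergeWithKey (\_ x y -> Just (both x y)) (fmap onlyLeft) (fmap onlyRight)

-- Invented example: pair up actual and budgeted amounts per period key.
actuals, goals :: IM.IntMap Integer
actuals = IM.fromList [(1, 10), (2, 20)]
goals   = IM.fromList [(2, 25), (3, 30)]

paired :: IM.IntMap (Maybe Integer, Maybe Integer)
paired = mergeWith (\a g -> (Just a, Just g))
                   (\a -> (Just a, Nothing))
                   (\g -> (Nothing, Just g))
                   actuals goals
```

The cost of this design is that callers can no longer drop keys during the merge, which the commit note suggests is not needed.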
@Xitian9
Collaborator Author

Xitian9 commented Apr 25, 2025

Here is the benchmarking. Marginal change for small journals, but a time saving of about 8% for the 100k journal in the run below (11.35 vs 10.43), and savings of 2% to 6% for my real-life journal (21k transactions, 796 accounts of depth 7).

Running 6 tests 5 times with 2 executables at 2025-04-25 14:16:10 AEST:

Best times:
+--------------------------------------------------------------------++------------------+------------------------+
|                                                                    || ./hledger-master | ./hledger-multiaccount |
+====================================================================++==================+========================+
| -f examples/10ktxns-1kaccts.journal balance                        ||             0.82 |                   0.81 |
| -f examples/1ktxns-1kaccts.journal balance --weekly                ||             0.64 |                   0.64 |
| -f examples/10ktxns-1kaccts.journal balance --weekly               ||             7.66 |                   7.57 |
| -f examples/100ktxns-1kaccts.journal balance --yearly              ||            11.35 |                  10.43 |
| balance --value=end @/home/myname/expenses-report.args             ||             1.40 |                   1.31 |
| balance --layout=tidy --daily @/home/myname/assetsliabilities.args ||             1.49 |                   1.46 |
+--------------------------------------------------------------------++------------------+------------------------+
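
For readers who want the table as percentages, a small sketch of the arithmetic follows. The function and list names are hypothetical, the row labels are abbreviated, and the numbers are copied from the first four rows of the table above.

```haskell
-- Relative time saving of a new timing against a baseline, in percent.
percentSaved :: Double -> Double -> Double
percentSaved baseline new = (baseline - new) / baseline * 100

-- Best times from the table above: (test, master, multiaccount).
bestTimes :: [(String, Double, Double)]
bestTimes =
  [ ("10ktxns balance",           0.82,  0.81)
  , ("1ktxns balance --weekly",   0.64,  0.64)
  , ("10ktxns balance --weekly",  7.66,  7.57)
  , ("100ktxns balance --yearly", 11.35, 10.43)
  ]

-- Percentage saved for each row.
savings :: [(String, Double)]
savings = [ (name, percentSaved old new) | (name, old, new) <- bestTimes ]
```

For the 100k-transaction row this gives roughly 8% for this particular run. These are best-of-five times, so small differences in the other rows are within noise.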

hledger stats shows no difference in memory usage, but that's not too surprising as it uses a single-period balance report in the backend. The changes will only affect multi-period balance reports.

@Xitian9
Collaborator Author

Xitian9 commented Apr 25, 2025

It looks like memory use is a bit higher with the new code. Lower total heap allocation, but higher maximum residency.

$ ./hledger-master-prof -f examples/100ktxns-1kaccts.journal balance --yearly +RTS -s > /dev/null
  90,879,905,384 bytes allocated in the heap
   8,694,480,976 bytes copied during GC
     559,648,704 bytes maximum residency (15 sample(s))
      13,950,024 bytes maximum slop
            1573 MiB total memory in use (0 MiB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     21878 colls,     0 par    2.543s   2.555s     0.0001s    0.0034s
  Gen  1        15 colls,     0 par    2.087s   2.092s     0.1395s    0.2950s

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.003s  (  0.003s elapsed)
  MUT     time   57.091s  ( 57.175s elapsed)
  GC      time    4.630s  (  4.647s elapsed)
  RP      time    0.000s  (  0.000s elapsed)
  PROF    time    0.000s  (  0.000s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time   61.725s  ( 61.825s elapsed)

  Alloc rate    1,591,840,064 bytes per MUT second

  Productivity  92.5% of total user, 92.5% of total elapsed

$ ./hledger-multiaccount-prof -f examples/100ktxns-1kaccts.journal balance --yearly +RTS -s > /dev/null

  87,899,201,024 bytes allocated in the heap
   8,609,724,032 bytes copied during GC
     603,841,256 bytes maximum residency (15 sample(s))
      13,949,544 bytes maximum slop
            1688 MiB total memory in use (0 MiB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     21163 colls,     0 par    2.438s   2.450s     0.0001s    0.0034s
  Gen  1        15 colls,     0 par    2.183s   2.188s     0.1459s    0.3211s

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.003s  (  0.003s elapsed)
  MUT     time   56.039s  ( 56.113s elapsed)
  GC      time    4.622s  (  4.638s elapsed)
  RP      time    0.000s  (  0.000s elapsed)
  PROF    time    0.000s  (  0.000s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time   60.664s  ( 60.754s elapsed)

  Alloc rate    1,568,534,957 bytes per MUT second

  Productivity  92.4% of total user, 92.4% of total elapsed

@Xitian9
Collaborator Author

Xitian9 commented Apr 25, 2025

I think I've responded to all comments. Let me know if you want to discuss further.

accountMap = processPostings ps

processPostings :: [Posting] -> HM.HashMap AccountName (AccountBalances AccountBalance)
processPostings = foldl' (flip processAccountName) mempty
Collaborator Author

There's a question about whether we get better performance with this as foldl' or foldr. It seems that foldl' is slightly faster, while foldr has better memory usage.
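
The two variants being compared can be sketched as below. This is a simplified stand-in, not the PR's code: it swaps `Data.Map.Strict` in for `HashMap` to stay within GHC's bundled libraries, and the `Posting` triple, `Balances` alias and `addPosting` helper are hypothetical.

```haskell
import Data.List (foldl')
import qualified Data.IntMap.Strict as IM
import qualified Data.Map.Strict as M

-- Hypothetical simplified posting: (account, period start as Int key, amount).
type Posting = (String, Int, Integer)

-- Per-account, per-period totals.
type Balances = M.Map String (IM.IntMap Integer)

-- Add one posting to the running totals.
addPosting :: Posting -> Balances -> Balances
addPosting (acct, start, amt) =
  M.insertWith (IM.unionWith (+)) acct (IM.singleton start amt)

-- Strict left fold: the accumulator map is forced after every posting.
processLeft :: [Posting] -> Balances
processLeft = foldl' (flip addPosting) M.empty

-- Right fold: postings are applied starting from the end of the list.
-- The result is the same here because addition is commutative.
processRight :: [Posting] -> Balances
processRight = foldr addPosting M.empty
```

Both functions visit the posting list once and produce equal maps, so the choice only affects evaluation order, and therefore time and residency, which is what the measurements in this thread compare.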

Collaborator Author

Here's the time and memory usage when using the foldr version. I wonder if foldr is the winner here.

Running 6 tests 5 times with 3 executables at 2025-04-25 19:44:53 AEST:

Best times:
+--------------------------------------------------------------------++------------------+------------------------+------------------------------+
|                                                                    || ./hledger-master | ./hledger-multiaccount | ./hledger-multiaccount-foldr |
+====================================================================++==================+========================+==============================+
| -f examples/10ktxns-1kaccts.journal balance                        ||             0.81 |                   0.80 |                         0.81 |
| -f examples/1ktxns-1kaccts.journal balance --weekly                ||             0.68 |                   0.65 |                         0.64 |
| -f examples/10ktxns-1kaccts.journal balance --weekly               ||             7.91 |                   7.63 |                         7.94 |
| -f examples/100ktxns-1kaccts.journal balance --yearly              ||            11.29 |                  10.92 |                        10.84 |
| balance --value=end @/home/myname/expenses-report.args             ||             1.39 |                   1.32 |                         1.32 |
| balance --layout=tidy --daily @/home/myname/assetsliabilities.args ||             1.49 |                   1.46 |                         1.46 |
+--------------------------------------------------------------------++------------------+------------------------+------------------------------+
$ ./hledger-multiaccount-foldr-prof -f examples/100ktxns-1kaccts.journal balance --yearly +RTS -s > /dev/null
  87,949,386,664 bytes allocated in the heap
  8,323,439,592 bytes copied during GC
    558,243,136 bytes maximum residency (15 sample(s))
     13,941,192 bytes maximum slop
           1551 MiB total memory in use (0 MiB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
 Gen  0     21167 colls,     0 par    2.663s   2.680s     0.0001s    0.0100s
 Gen  1        15 colls,     0 par    2.385s   2.394s     0.1596s    0.5332s

 TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

 SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

 INIT    time    0.002s  (  0.002s elapsed)
 MUT     time   56.837s  ( 56.914s elapsed)
 GC      time    5.048s  (  5.074s elapsed)
 RP      time    0.000s  (  0.000s elapsed)
 PROF    time    0.000s  (  0.000s elapsed)
 EXIT    time    0.000s  (  0.000s elapsed)
 Total   time   61.888s  ( 61.990s elapsed)

 Alloc rate    1,547,396,211 bytes per MUT second

 Productivity  91.8% of total user, 91.8% of total elapsed

Collaborator Author

@Xitian9 Xitian9 Apr 25, 2025

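Actually, I think I messed up the memory analysis by using profiled versions of the executables. The difference is less dramatic for the normal versions: maximum residency is about 3% higher than master (versus about 8% with profiling), and total memory in use is slightly lower. I've included them here.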

$ ./hledger-master -f examples/100ktxns-1kaccts.journal balance --yearly +RTS -s
  43,526,689,224 bytes allocated in the heap
   4,818,585,000 bytes copied during GC
     376,952,024 bytes maximum residency (13 sample(s))
       3,414,824 bytes maximum slop
            1041 MiB total memory in use (0 MiB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     10559 colls,     0 par    1.651s   1.660s     0.0002s    0.0043s
  Gen  1        13 colls,     0 par    1.711s   1.718s     0.1322s    0.3766s

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.002s  (  0.001s elapsed)
  MUT     time    7.573s  (  7.582s elapsed)
  GC      time    3.362s  (  3.378s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time   10.937s  ( 10.961s elapsed)

  Alloc rate    5,747,632,285 bytes per MUT second

  Productivity  69.2% of total user, 69.2% of total elapsed

$ ./hledger-multiaccount -f examples/100ktxns-1kaccts.journal balance --yearly +RTS -s
  41,742,275,096 bytes allocated in the heap
   4,600,175,152 bytes copied during GC
     388,933,464 bytes maximum residency (13 sample(s))
       2,396,632 bytes maximum slop
             999 MiB total memory in use (0 MiB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     10128 colls,     0 par    1.628s   1.637s     0.0002s    0.0034s
  Gen  1        13 colls,     0 par    1.494s   1.499s     0.1153s    0.2873s

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    7.544s  (  7.561s elapsed)
  GC      time    3.122s  (  3.136s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time   10.667s  ( 10.698s elapsed)

  Alloc rate    5,533,021,721 bytes per MUT second

  Productivity  70.7% of total user, 70.7% of total elapsed

$ ./hledger-multiaccount-foldr -f examples/100ktxns-1kaccts.journal balance --yearly +RTS -s
  41,751,866,552 bytes allocated in the heap
   4,611,401,952 bytes copied during GC
     381,899,736 bytes maximum residency (13 sample(s))
       3,317,352 bytes maximum slop
            1014 MiB total memory in use (0 MiB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     10129 colls,     0 par    1.528s   1.537s     0.0002s    0.0028s
  Gen  1        13 colls,     0 par    1.503s   1.507s     0.1159s    0.3262s

  TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.000s  (  0.000s elapsed)
  MUT     time    7.356s  (  7.368s elapsed)
  GC      time    3.031s  (  3.044s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time   10.387s  ( 10.412s elapsed)

  Alloc rate    5,676,066,670 bytes per MUT second

  Productivity  70.8% of total user, 70.8% of total elapsed

Labels
balance balancesheet balancesheetequity budget The balance command's --budget report cashflow incomestatement performance Anything performance-related (run time, memory usage, disk space..)
2 participants